22 research outputs found

    Flexible, non-parametric modeling using regularized neural networks

    Get PDF
    Non-parametric, additive models are able to capture complex data dependencies in a flexible, yet interpretable way. However, choosing the format of the additive components often requires non-trivial data exploration. Here, as an alternative, we propose PrAda-net, a one-hidden-layer neural network, trained with proximal gradient descent and adaptive lasso. PrAda-net automatically adjusts the size and architecture of the neural network to reflect the complexity and structure of the data. The compact network obtained by PrAda-net can be translated to additive model components, making it suitable for non-parametric statistical modelling with automatic model selection. We demonstrate PrAda-net on simulated data, where we compare the test error performance, variable importance and variable subset identification properties of PrAda-net to other lasso-based regularization approaches for neural networks. We also apply PrAda-net to the massive U.K. black smoke data set, to demonstrate how PrAda-net can be used to model complex and heterogeneous data with spatial and temporal components. In contrast to classical, statistical non-parametric approaches, PrAda-net requires no preliminary modeling to select the functional forms of the additive components, yet still results in an interpretable model representation

    Sources of variation in cell-type RNA-Seq profiles

    Get PDF
    Cell-type specific gene expression profiles are needed for many computational methods operating on bulk RNA-Seq samples, such as deconvolution of cell-type fractions and digital cytometry. However, the gene expression profile of a cell type can vary substantially due to both technical factors and biological differences in cell state and surroundings, reducing the efficacy of such methods. Here, we investigated which factors contribute most to this variation. We evaluated different normalization methods, quantified the variance explained by different factors, evaluated the effect on deconvolution of cell type fractions, and examined the differences between UMI-based single-cell RNA-Seq and bulk RNA-Seq. We investigated a collection of publicly available bulk and single-cell RNA-Seq datasets containing B and T cells, and found that the technical variation across laboratories is substantial, even for genes specifically selected for deconvolution, and this variation has a confounding effect on deconvolution. Tissue of origin is also a substantial factor, highlighting the challenge of using cell type profiles derived from blood with mixtures from other tissues. We also show that much of the differences between UMI-based single-cell and bulk RNA-Seq methods can be explained by the number of read duplicates per mRNA molecule in the single-cell sample. Our work shows the importance of either matching or correcting for technical factors when creating cell-type specific gene expression profiles that are to be used together with bulk samples

    Dose-response relationships of intestinal organs and excessive mucus discharge after gynaecological radiotherapy

    Get PDF
    Background The study aims to determine possible dose-volume response relationships between the rectum, sigmoid colon and small intestine and the ‘excessive mucus discharge’ syndrome after pelvic radiotherapy for gynaecological cancer. Methods and materials From a larger cohort, 98 gynaecological cancer survivors were included in this study. These survivors, who were followed for 2 to 14 years, received external beam radiation therapy but not brachytherapy and not did not have stoma. Thirteen of the 98 developed excessive mucus discharge syndrome. Three self-assessed symptoms were weighted together to produce a score interpreted as ‘excessive mucus discharge’ syndrome based on the factor loadings from factor analysis. The dose-volume histograms (DVHs) for rectum, sigmoid colon, small intestine for each survivor were exported from the treatment planning systems. The dose-volume response relationships for excessive mucus discharge and each organ at risk were estimated by fitting the data to the Probit, RS, LKB and gEUD models. Results The small intestine was found to have steep dose-response curves, having estimated dose-response parameters: γ : 1.28, 1.23, 1.32, D : 61.6, 63.1, 60.2 for Probit, RS and LKB respectively. The sigmoid colon (AUC: 0.68) and the small intestine (AUC: 0.65) had the highest AUC values. For the small intestine, the DVHs for survivors with and without excessive mucus discharge were well separated for low to intermediate doses; this was not true for the sigmoid colon. Based on all results, we interpret the results for the small intestine to reflect a relevant link. Conclusion An association was found between the mean dose to the small intestine and the occurrence of ‘excessive mucus discharge’. When trying to reduce and even eliminate the incidence of ‘excessive mucus discharge’, it would be useful and important to separately delineate the small intestine and implement the dose-response estimations reported in the study

    Generation and analysis of context-specific genome-scale metabolic models derived from single-cell RNA-Seq data

    Get PDF
    Single-cell RNA sequencing combined with genome-scale metabolic models (GEMs) has the potential to unravel the differences in metabolism across both cell types and cell states but requires new computational methods. Here, we present a method for generating cell-type-specific genome-scale models from clusters of single-cell RNA-Seq profiles. Specifically, we developed a method to estimate the minimum number of cells required to pool to obtain stable models, a bootstrapping strategy for estimating statistical inference, and a faster version of the task-driven integrative network inference for tissues\ua0algorithm for generating context-specific GEMs. In addition, we evaluated the effect of different RNA-Seq normalization methods on model topology and differences in models generated from single-cell and bulk RNA-Seq data. We applied our methods on data from mouse cortex neurons and cells from the tumor microenvironment of lung cancer and in both cases found that almost every cell subtype had a unique metabolic profile. In addition, our approach was able to detect cancer-associated metabolic differences between cancer cells and healthy cells, showcasing its utility. We also contextualized models from 202 single-cell clusters across 19 human organs using data from Human Protein Atlas and made these available in the web portal Metabolic Atlas, thereby providing a valuable resource to the scientific community. With the ever-increasing availability of single-cell RNA-Seq datasets and continuously improved GEMs, their combination holds promise to become an important approach in the study of human metabolism

    Modeling glioblastoma heterogeneity as a dynamic network of cell states

    Get PDF
    Tumor cell heterogeneity is a crucial characteristic of malignant brain tumors and underpins phenomena such as therapy resistance and tumor recurrence. Advances in single-cell analysis have enabled the delineation of distinct cellular states of brain tumor cells, but the time-dependent changes in such states remain poorly understood. Here, we construct quantitative models of the time-dependent transcriptional variation of patient-derived glioblastoma (GBM) cells. We build the models by sampling and profiling barcoded GBM cells and their progeny over the course of 3\ua0weeks and by fitting a mathematical model to estimate changes in GBM cell states and their growth rates. Our model suggests a hierarchical yet plastic organization of GBM, where the rates and patterns of cell state switching are partly patient-specific. Therapeutic interventions produce complex dynamic effects, including inhibition of specific states and altered differentiation. Our method provides a general strategy to uncover time-dependent changes in cancer cells and offers a way to evaluate and predict how therapy affects cell state composition

    DSAVE: Detection of misclassified cells in single-cell RNA-Seq data

    Get PDF
    Single-cell RNA sequencing has become a valuable tool for investigating cell types in complex tissues, where clustering of cells enables the identification and comparison of cell populations. Although many studies have sought to develop and compare different clustering approaches, a deeper investigation into the properties of the resulting populations is lacking. Specifically, the presence of misclassified cells can influence downstream analyses, highlighting the need to assess subpopulation purity and to detect such cells. We developed DSAVE (Down-SAmpling based Variation Estimation), a method to evaluate the purity of single-cell transcriptome clusters and to identify misclassified cells. The method utilizes down-sampling to eliminate differences in sampling noise and uses a log-likelihood based metric to help identify misclassified cells. In addition, DSAVE estimates the number of cells needed in a population to achieve a stable average gene expression profile within a certain gene expression range. We show that DSAVE can be used to find potentially misclassified cells that are not detectable by similar tools and reveal the cause of their divergence from the other cells, such as differing cell state or cell type. With the growing use of single-cell RNA-seq, we foresee that DSAVE will be an increasingly useful tool for comparing and purifying subpopulations in single-cell RNA-Seq datasets

    Digital twins to personalize medicine

    Get PDF
    Personalized medicine requires the integration and processing of vast amounts of data. Here, we propose a solution to this challenge that is based on constructing Digital Twins. These are high-resolution models of individual patients that are computationally treated with thousands of drugs to find the drug that is optimal for the patient

    NCAE: data-driven representations using a deep network-coherent DNA methylation autoencoder identify robust disease and risk factor signatures

    Get PDF
    Precision medicine relies on the identification of robust disease and risk factor signatures from omics data. However, current knowledge-driven approaches may overlook novel or unexpected phenomena due to the inherent biases in biological knowledge. In this study, we present a data-driven signature discovery workflow for DNA methylation analysis utilizing network-coherent autoencoders (NCAEs) with biologically relevant latent embeddings. First, we explored the architecture space of autoencoders trained on a large-scale pan-tissue compendium (n = 75 272) of human epigenome-wide association studies. We observed the emergence of co-localized patterns in the deep autoencoder latent space representations that corresponded to biological network modules. We determined the NCAE configuration with the strongest co-localization and centrality signals in the human protein interactome. Leveraging the NCAE embeddings, we then trained interpretable deep neural networks for risk factor (aging, smoking) and disease (systemic lupus erythematosus) prediction and classification tasks. Remarkably, our NCAE embedding-based models outperformed existing predictors, revealing novel DNA methylation signatures enriched in gene sets and pathways associated with the studied condition in each case. Our data-driven biomarker discovery workflow provides a generally applicable pipeline to capture relevant risk factor and disease information. By surpassing the limitations of knowledge-driven methods, our approach enhances the understanding of complex epigenetic processes, facilitating the development of more effective diagnostic and therapeutic strategies
    corecore